Deep learning is widely used for perception (e.g., 3D object detection) in intelligent vehicle driving. With Vehicle-to-Vehicle (V2V) communication, deep-learning-based features from other agents can be shared with the ego vehicle to improve its perception. This is called Cooperative Perception in V2V research, and its algorithms have advanced dramatically in recent years. However, all existing cooperative perception algorithms assume ideal V2V communication and do not consider that shared features may be corrupted by Lossy Communication (LC), which is common in complex real-world driving scenarios. In this paper, we first study the side effects (e.g., a drop in detection performance) of lossy communication on V2V Cooperative Perception. We then propose a novel intermediate LC-aware feature fusion method that relieves these side effects with an LC-aware Repair Network (LCRN) and enhances the interaction between the ego vehicle and other vehicles with a specially designed V2V Attention Module (V2VAM), which includes intra-vehicle attention for the ego vehicle and uncertainty-aware inter-vehicle attention. Extensive experiments on the public cooperative perception dataset OPV2V (based on the digital-twin CARLA simulator) demonstrate that the proposed method is effective for cooperative point-cloud-based 3D object detection under lossy V2V communication.
translated by Google Translate
The processing and recognition of geoscience images have wide applications. Most existing research focuses on understanding high-quality geoscience images and assumes that all images are clear. However, in many real-world cases, geoscience images contain occlusions introduced during image acquisition. This corresponds to the image inpainting problem in computer vision and multimedia. To the best of our knowledge, all existing image inpainting algorithms learn to repair occluded regions for better visualization quality. They work well for natural images but are not good enough for geoscience images because they ignore the geoscience-related tasks. This paper aims to repair occluded regions so as to improve geoscience task performance and visualization quality simultaneously, without changing the currently deployed deep-learning-based geoscience models. Because of the complex context of geoscience images, we propose a coarse-to-fine encoder-decoder network with coarse-to-fine adversarial context discriminators to reconstruct the occluded image regions. Because geoscience image data is limited, we use a MaskMix-based data augmentation method to exploit more information from it. Experimental results on three public geoscience datasets, for remote sensing scene recognition, cross-view geolocation, and semantic segmentation respectively, show the effectiveness and accuracy of the proposed method.
Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges have yet to be adequately considered in research. This paper presents an extensive literature review of the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real-world conditions, bias inherent in datasets, the high volume of data that needs to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions; surveillance cameras can be mounted at a variety of angles, gather dirt, and shake in the wind; and traffic conditions are highly heterogeneous, with rule violations and complex interactions in crowded scenarios. Representative applications that suffer from these problems include traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. Possible ways of dealing with the challenges are also explored, with priority given to practical deployment.
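Facial landmark detection is a fundamental and significant vision task with many important applications. In practice, facial landmark detection can be affected by many natural degradations. One of the most common and important degradations is the shadow caused by a blocked light source. Although many advanced shadow removal methods have been proposed in recent years to restore image quality, their effect on facial landmark detection has not been well studied. For example, it remains unclear whether shadow removal can enhance the robustness of facial landmark detection to diverse shadow patterns. In this work, as a first attempt, we construct a novel benchmark that links two independent but related tasks (i.e., shadow removal and facial landmark detection). In particular, the proposed benchmark covers diverse face shadows with different intensities, sizes, shapes, and locations. Moreover, to mine hard shadow patterns against facial landmark detection, we propose a novel method (i.e., adversarial shadow attack), which allows us to construct a challenging subset of the benchmark for comprehensive analysis. With the constructed benchmark, we conduct an extensive analysis of three state-of-the-art shadow removal methods and three landmark detectors. The observations of this work motivate us to design a novel detection-aware shadow removal framework, which enables shadow removal to achieve higher restoration quality and enhances the shadow robustness of deployed facial landmark detectors.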
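Recent work on 2D-to-3D human pose estimation tends to use the graph structure formed by the topology of the human skeleton. However, we argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem. To overcome these weaknesses, we propose a novel graph convolutional network architecture, Hierarchical Graph Networks (HGN). It is based on a denser graph topology generated by our multi-scale graph structure building strategy, and thus provides finer geometric information. The proposed architecture contains three sparse-to-fine representation subnetworks organized in parallel, in which multi-scale graph-structured features are processed and exchange information through a novel feature fusion strategy, leading to rich hierarchical representations. We also introduce a 3D coarse mesh constraint to further improve detail-related feature learning. Extensive experiments demonstrate that our HGN achieves state-of-the-art performance with fewer network parameters.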
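Deep face recognition (FR) has achieved very high accuracy on several challenging datasets and has fostered successful real-world applications, even showing high robustness to illumination variation, which is usually regarded as a main threat to FR systems. However, in the real world, the illumination variation caused by diverse lighting conditions cannot be fully covered by limited face datasets. In this paper, we study the threat of lighting to FR from a new angle, i.e., adversarial attack, and identify a new task, i.e., adversarial relighting. Given a face image, adversarial relighting aims to produce a naturally relighted counterpart while fooling state-of-the-art deep FR methods. To this end, we first propose a physical-model-based adversarial relighting attack (ARA), denoted the albedo-quotient-based adversarial relighting attack (AQ-ARA). It generates natural adversarial light under the guidance of the physical lighting model and the FR system, and synthesizes adversarially relighted face images. Moreover, we propose the auto-predictive adversarial relighting attack (AP-ARA), which trains an adversarial relighting network (ARNet) to predict the adversarial light automatically, in a one-step manner, for different input faces, enabling efficiency-sensitive applications. More importantly, we propose to transfer the above digital attacks to a physical ARA (Phy-ARA) through a precise relighting device, making the estimated adversarial lighting condition reproducible in the real world. We validate our methods on three state-of-the-art deep FR methods (i.e., FaceNet, ArcFace, and CosFace) on two public datasets. The extensive and insightful results demonstrate that our work can generate realistic adversarially relighted face images that easily fool FR, revealing the threat posed by specific light directions and strengths.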
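Co-salient object detection (CoSOD) has recently made significant progress and plays a key role in retrieval-related tasks. However, it inevitably poses an entirely new security issue: highly personal and sensitive content can potentially be extracted by powerful CoSOD methods. In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: adversarial co-saliency attack. Specifically, given an image selected from a group of images containing some common and salient objects, we aim to generate an adversarial version that can mislead CoSOD methods into predicting incorrect co-salient regions. Note that, compared with general white-box adversarial attacks on classification, this new task faces two additional challenges: (1) a low success rate due to the diverse appearance of the images in the group; and (2) low transferability across CoSOD methods due to the considerable differences between CoSOD pipelines. To address these challenges, we propose the first black-box joint adversarial exposure and noise attack (JADENA), in which we jointly and locally tune the exposure and the additive perturbations of the image according to a newly designed high-feature-level contrast-sensitive loss function. Our method, without any information about the state-of-the-art CoSOD methods, leads to significant performance degradation on various co-saliency detection datasets and makes the co-salient objects undetectable. This can have strong practical benefits for properly securing the large number of personal photos currently shared on the internet. Moreover, our method has the potential to be used as a metric for evaluating the robustness of CoSOD methods.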
The electrification of shared mobility has become popular across the globe. Many cities have deployed new shared e-mobility systems, with coverage continuously expanding from central areas to the city edges. A key challenge in operating these systems is fleet rebalancing, i.e., how EVs should be repositioned to better satisfy future demand. This is particularly challenging in expanding systems, because i) the range of the EVs is limited while charging time is typically long, which constrains the viable rebalancing operations; and ii) the EV stations in the system change dynamically, i.e., the legitimate targets for rebalancing operations can vary over time. We tackle these challenges by first investigating rich data collected over one year from a real-world shared e-mobility system, analyzing the operation model, usage patterns, and expansion dynamics of this new mobility mode. With the learned knowledge, we design a high-fidelity simulator that abstracts the key operation details of EV sharing at fine granularity. We then model the rebalancing task for shared e-mobility systems under continuous expansion as a Multi-Agent Reinforcement Learning (MARL) problem, which directly takes the range and charging properties of the EVs into account. We further propose a novel policy optimization approach with action cascading, which copes with the expansion dynamics and solves the formulated MARL problem. We evaluate the proposed approach extensively, and experimental results show that it outperforms the state of the art, offering significant gains in both satisfied demand and net revenue.
Deep convolutional neural networks have achieved great progress in image denoising tasks. However, their complicated architectures and heavy computational cost hinder their deployment on mobile devices. Some recent efforts to design lightweight denoising networks focus on reducing either FLOPs (floating-point operations) or the number of parameters. However, these metrics are not directly correlated with on-device latency. Through extensive analysis and experiments, we identify network architectures that can fully utilize powerful neural processing units (NPUs) and thus achieve both low latency and excellent denoising performance. To this end, we propose a mobile-friendly denoising network, namely MFDNet. The experiments show that MFDNet achieves state-of-the-art performance on the real-world denoising benchmarks SIDD and DND under real-time latency on mobile devices. The code and pre-trained models will be released.
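To make the two proxy metrics mentioned above concrete, the following is a minimal sketch of how the parameter count and FLOPs of a single 2D convolution layer are usually estimated. The helper name `conv2d_cost` and the convention of counting one multiply-accumulate as two FLOPs are assumptions made for illustration; they are not taken from the MFDNet paper, and the sketch says nothing about on-device latency.

```python
from typing import Tuple


def conv2d_cost(c_in: int, c_out: int, kernel: int,
                h_out: int, w_out: int, groups: int = 1) -> Tuple[int, int]:
    """Estimate (parameters, FLOPs) of one 2D convolution layer with bias.

    Hypothetical helper for illustration only. Assumes a square kernel and
    counts each multiply-accumulate (MAC) as 2 FLOPs.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # Each output channel sees c_in / groups input channels.
    weights = (c_in // groups) * kernel * kernel * c_out
    params = weights + c_out              # weights plus one bias per output channel
    macs = weights * h_out * w_out        # one MAC per weight per output position
    return params, 2 * macs


# A standard 3x3 convolution versus a depthwise 3x3 convolution (groups = channels),
# both mapping 64 channels to 64 channels on a 128x128 output feature map.
standard = conv2d_cost(64, 64, 3, 128, 128)
depthwise = conv2d_cost(64, 64, 3, 128, 128, groups=64)
print("standard :", standard)
print("depthwise:", depthwise)
```

The depthwise layer has far fewer parameters and FLOPs on paper, yet, as the abstract points out, such counts are not directly correlated with latency on a given device, so they need to be checked against measured latency on the target hardware.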
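Rotational speed is one of the important metrics to measure when calibrating electric motors in manufacturing, monitoring engines during car repair, detecting faults in electrical appliances, and so on. Existing measurement approaches, however, [...] or are inconvenient to use in real-world application scenarios. In this paper, we propose EV-Tach, an event-based tachometer that relies on efficient dynamic vision sensing on mobile devices. By introducing the dynamic vision sensor as a new sensing modality, EV-Tach is designed as a high-fidelity and convenient tachometer that captures high-speed rotation precisely in a variety of real-world scenarios. With a series of signal processing algorithms tailored to dynamic vision sensing on mobile devices, EV-Tach accurately extracts the rotational speed from the event stream that dynamic vision sensing produces on rotating targets. According to our extensive evaluation, the Relative Mean Absolute Error (RMAE) of EV-Tach is as low as 0.03%, which is comparable to a state-of-the-art laser tachometer in fixed measurement mode. Moreover, EV-Tach is robust to subtle movements of the user's hand and can therefore be used as a handheld device, a setting in which the laser tachometer fails to produce reasonable results.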
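The abstract reports accuracy as a Relative Mean Absolute Error (RMAE) without defining it. The sketch below assumes the common definition, the mean of |estimate - truth| / |truth| over all measurements; the function name and the sample readings are hypothetical and are not taken from the EV-Tach evaluation.

```python
from typing import Sequence


def relative_mean_absolute_error(estimates: Sequence[float],
                                 ground_truth: Sequence[float]) -> float:
    """Mean of |estimate - truth| / |truth| (assumed RMAE definition)."""
    if len(estimates) != len(ground_truth) or len(estimates) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for est, true in zip(estimates, ground_truth):
        if true == 0:
            raise ValueError("ground-truth speed must be non-zero")
        total += abs(est - true) / abs(true)
    return total / len(estimates)


# Hypothetical readings in revolutions per minute (not data from the paper).
measured = [1000.3, 1999.4]
reference = [1000.0, 2000.0]
rmae = relative_mean_absolute_error(measured, reference)
print(f"RMAE = {rmae * 100:.3f}%")
```

Because each error is normalized by the true speed, the metric allows slow and fast rotations to be compared on the same scale.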